## Author: Kiril Boyanov (kirilboyanov [at] gmail.com)
## LinkedIn: www.linkedin.com/kirilboyanov/
## Last update: 2023-12-08
In this file, we explore the correlations between happiness and various economic, political, societal, environmental and health-related factors. We do this both for the most recent year in the data and for all available years. Finally, we explore how the correlations have evolved over time.
Importing relevant packages, defining custom functions, specifying local folders etc.
```r
# Importing relevant packages
# For general data-related tasks
library(plyr)
library(tidyverse)
library(data.table)
library(openxlsx)
library(readxl)
library(arrow)
library(zoo)
# For working with countries
library(countrycode)
# For statistical analysis
library(corrr)
# For data visualization
library(ggplot2)
library(plotly)
library(rjson)
```
Throughout the analysis, we will be using a common
BaseYear (to represent the past state of happiness) and a
common ReferenceYear (to represent the most recent state of
happiness). To ensure consistency across files, these two years are
stored in a TXT file, which is imported below.
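The notebook's actual import code is not shown here, so the following is only a minimal sketch of how such a shared settings file could be read. It assumes a hypothetical layout of one `Name: value` pair per line and writes a temporary file so the example runs on its own; the real file name and format may differ.

```r
# Hypothetical settings file layout (an assumption; the notebook's real
# file name and format are not shown): one "Name: value" pair per line
settings_path <- tempfile(fileext = ".txt")
writeLines(c("BaseYear: 2005", "ReferenceYear: 2022"), settings_path)

# Read every line and split it into a name and a value at the colon
pairs <- strsplit(readLines(settings_path), ":\\s*")
settings <- setNames(
  as.integer(vapply(pairs, `[`, character(1), 2)),
  vapply(pairs, `[`, character(1), 1)
)

# Expose the two years under the names used throughout the analysis
BaseYear <- settings[["BaseYear"]]
ReferenceYear <- settings[["ReferenceYear"]]
cat(sprintf("Base year: %d\nReference year: %d\n", BaseYear, ReferenceYear))
```

Keeping both years in one small file means every notebook reads the same values, so changing the reference year only requires editing a single line.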
Thus, we use the following years as base and reference:
## Base year: 2005
## Reference year: 2022
We import data that has already been pre-processed in the
WHR_data_prep.Rmd notebook and whose missing values were
imputed in the
Dealing_with_missing_data.Rmd notebook. A preview of the
imported data is shown below:
Please note that since the data contains two different measures of GDP, we drop the one based on current prices so as not to overestimate the importance of GDP, which would otherwise be counted twice among the correlates.
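As a minimal sketch of this step, the snippet below drops one column by name from a toy data frame. The column names (`GDP_pc_current_USD`, `GDP_pc_constant`) and values are hypothetical placeholders, not the names used in the actual data.

```r
# Toy data with two GDP measures; column names and values are hypothetical
df <- data.frame(
  Country = c("A", "B", "C"),
  Happiness = c(7.1, 5.4, 4.2),
  GDP_pc_current_USD = c(52000, 9000, 1500),
  GDP_pc_constant = c(55000, 15000, 3000)
)

# Drop the current-price measure and keep the other GDP measure
df <- df[, setdiff(names(df), "GDP_pc_current_USD")]
print(names(df))
```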
We start our analysis by exploring how strongly different factors correlate with countries' annual happiness scores. We calculate the Pearson correlation between happiness and every other variable in the data and sort the factors by absolute correlation, strongest first. A preview of the 20 strongest correlations is shown below:
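The ranking step can be sketched in base R as follows. This is an illustration on simulated data, not the notebook's code: the notebook loads corrr, whose `correlate()` serves the same purpose, while this sketch uses `stats::cor()` so that it runs without extra packages. The column names are hypothetical, and `use = "pairwise.complete.obs"` is an assumption about how any remaining missing values would be handled.

```r
# Simulated toy data: one column driven by happiness, two pure-noise columns
set.seed(42)
n <- 50
happiness <- rnorm(n, mean = 5.5, sd = 1)
df <- data.frame(
  Happiness = happiness,
  Health_expenditure = 2 * happiness + rnorm(n),
  Unrelated_1 = rnorm(n),
  Unrelated_2 = rnorm(n)
)

# Pearson correlation between happiness and every other column
predictors <- setdiff(names(df), "Happiness")
cors <- vapply(
  predictors,
  function(v) cor(df[[v]], df$Happiness, method = "pearson",
                  use = "pairwise.complete.obs"),
  numeric(1)
)

# Sort by absolute correlation (strongest first) and keep at most 20 rows
ranked <- data.frame(Factor = predictors, Correlation = unname(cors))
ranked <- ranked[order(abs(ranked$Correlation), decreasing = TRUE), ]
rownames(ranked) <- NULL
top_20 <- head(ranked, 20)
print(top_20)
```

Sorting on `abs()` rather than the raw value ensures that strong negative correlates, such as a poverty ratio, rank alongside strong positive ones.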
The factors with the strongest correlation to happiness in 2022 were health expenditure and GDP per capita, followed by government effectiveness and the poverty headcount ratio at 6.85 USD a day. Several categories appear among the strongest correlates, with economic and political factors seemingly being the most prominent. Together, these two categories account for 70% (14 of 20) of the top 20 most strongly correlated factors:
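A minimal sketch of how the category shares could be computed, assuming a lookup table that maps each factor to a category. Both the factor names and the category assignments below are hypothetical and do not reflect the notebook's actual mapping.

```r
# Hypothetical factor-to-category lookup (not the notebook's actual mapping)
categories <- data.frame(
  Factor = c("Factor_A", "Factor_B", "Factor_C", "Factor_D", "Factor_E"),
  Category = c("Economic", "Political", "Economic", "Health", "Social")
)

# Illustrative top-ranked factors (in practice: the Factor column of the
# top-20 ranking)
top_factors <- c("Factor_A", "Factor_B", "Factor_C", "Factor_D")

# Look up each factor's category and compute each category's share
top_cats <- categories$Category[match(top_factors, categories$Factor)]
shares <- prop.table(table(top_cats))
print(round(100 * shares, 1))

# Combined share of economic and political factors
econ_pol_share <- sum(shares[c("Economic", "Political")])
```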
Conversely, among the factors least correlated with happiness (shown below), we find indicators such as land area, the proportion of the population aged 65+, military expenditure as % of GDP and CO2 emissions:
Grouping the least correlated factors into categories, we see a much more mixed picture, with social, environmental and health-related factors being more likely to show little relation to happiness:
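Selecting and grouping the weakest correlates mirrors the top-20 step, except that factors are sorted by ascending absolute correlation. The sketch below uses a small ranking table with made-up values and hypothetical factor names, purely to show the mechanics. `weakest_k()` is a hypothetical helper, not a function from the notebook.

```r
# Illustrative ranking table (made-up values for demonstration only)
ranked <- data.frame(
  Factor = c("Factor_A", "Factor_B", "Factor_C",
             "Factor_D", "Factor_E", "Factor_F"),
  Category = c("Economic", "Political", "Social",
               "Environmental", "Health", "Social"),
  Correlation = c(0.81, -0.74, 0.02, -0.05, 0.04, 0.08)
)

# Hypothetical helper: the k factors whose correlation is closest to zero
weakest_k <- function(ranked, k) {
  by_strength <- ranked[order(abs(ranked$Correlation)), ]
  head(by_strength, k)
}

weakest <- weakest_k(ranked, 4)
print(weakest)

# Count how often each category appears among the weakest correlates
category_counts <- table(weakest$Category)
print(category_counts)
```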